Use of Part Of Speech (POS) and morphological information for resolving Multiple Pronunciations in Pronunciation Lexicon Specification (PLS) for Indian Languages – Bengali as a Case Study

Authors

  • Shyamal Das Mandal
  • Somnath Chandra
  • Swaran Lata
Abstract

A pronunciation dictionary is one of the important components of speech technology development for a given language, because it represents the interface between speech analysis at the acoustic level and speech interpretation. The W3C Voice Browser Activity has published the Pronunciation Lexicon Specification (PLS) Version 1.0 [1] for building pronunciation lexicons in different languages. This paper proposes some modifications to the published PLS specification for Indian languages, with Bengali as a typical case study.
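
As a rough illustration of the problem named in the title, the sketch below resolves a word with multiple pronunciations by its POS tag. It relies on the `role` attribute that PLS 1.0 defines on `<lexeme>`. The English word "read", the `ex:` tag names, the namespace URI for them, and the helper functions are all illustrative assumptions; the modifications the paper proposes for Bengali are not described in the abstract and are not shown here.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# A tiny PLS 1.0 document. The "ex" prefix and its tag names (VB, VBD)
# are hypothetical; PLS only requires that role values be QNames.
PLS_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         xmlns:ex="http://example.com/hypothetical-pos-tags"
         alphabet="ipa" xml:lang="en-US">
  <lexeme role="ex:VB">
    <grapheme>read</grapheme>
    <phoneme>riːd</phoneme>
  </lexeme>
  <lexeme role="ex:VBD">
    <grapheme>read</grapheme>
    <phoneme>rɛd</phoneme>
  </lexeme>
</lexicon>"""


def load_lexicon(xml_text):
    """Map each grapheme to a list of (roles, phonemes) pairs."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = {}
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        # role is a space-separated list of QNames; it may be absent.
        roles = lexeme.get("role", "").split()
        phonemes = [p.text for p in lexeme.findall(f"{{{PLS_NS}}}phoneme")]
        for grapheme in lexeme.findall(f"{{{PLS_NS}}}grapheme"):
            entries.setdefault(grapheme.text, []).append((roles, phonemes))
    return entries


def pronounce(entries, word, pos_tag):
    """Return the pronunciation whose role matches pos_tag.

    Falls back to the first listed pronunciation when no role matches,
    and returns None for words that are not in the lexicon.
    """
    candidates = entries.get(word, [])
    for roles, phonemes in candidates:
        if pos_tag in roles and phonemes:
            return phonemes[0]
    for _, phonemes in candidates:
        if phonemes:
            return phonemes[0]
    return None


LEXICON = load_lexicon(PLS_DOC)
```

Here `pronounce(LEXICON, "read", "ex:VBD")` selects the past-tense entry, while an unrecognised tag falls back to the first entry in document order. In practice the POS tag would come from a tagger run over the input sentence.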

Similar resources

Identification of Prosodic Features of Punjabi for Enhancing the Pronunciation Lexicon Specification (pls) for Voice Browsing

Voice browsing requires a speech interface framework. Pronunciation Lexicon Specification (PLS) 1.0 is a recommendation of the Voice Browser Working Group of the W3C (World Wide Web Consortium), a machine-readable specification of pronunciation information which can be used for speech technology development. This global PLS standard is applicable across European and Asian languages and this specification ...

Web-Based Bengali News Corpus for Lexicon Development and POS Tagging

Lexicon development and Part of Speech (POS) tagging are very important for almost all Natural Language Processing (NLP) applications. The rapid development of these resources and tools using machine learning techniques for less computerized languages requires appropriately tagged corpus. We have used a Bengali news corpus, developed from the web archive of a widely read Bengali newspaper. The ...

Part-of-Speech Tagging and Chunking with Maximum Entropy Model

This paper describes our work on part-of-speech (POS) tagging and chunking for Indian languages, for the SPSAL shared task contest. We use a Maximum Entropy (ME) based statistical model. The tagger makes use of morphological and contextual information of words. Since only a small labeled training set is provided (approximately 21,000 words for all three languages), a ME based approach does not y...

Voted Approach for Part of Speech Tagging in Bengali

Part of Speech (POS) tagging is the task of labeling each word in a sentence with its appropriate syntactic category called part of speech. POS tagging is a very important preprocessing task for language processing activities. In this paper, we report about our work on POS tagging for Bengali by combining different POS tagging systems using three weighted voting techniques. The individual POS t...

Maximum Entropy Based Bengali Part of Speech Tagging

Part of Speech (POS) tagging can be described as a task of doing automatic annotation of syntactic categories for each word in a text document. This paper presents a POS tagger for Bengali using the statistical Maximum Entropy (ME) model. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the various POS cl...

Publication date: 2010